SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
https://arxiv.org/abs/2501.17161?utm_source=pytorchkr&ref=pytorchkr
Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models. However, their roles in enhancing model generalization capabilities remain unclear. This paper studies the difference bet..